We discuss a general method to construct correlated binomial distributions by imposing several 
consistent relations on the joint probability function. We obtain self-consistency relations for the 
conditional correlations and conditional probabilities. The beta-binomial distribution is derived 
by a strong symmetric assumption on the conditional correlations. Our derivation clarifies the 
'correlation' structure of the beta-binomial distribution. It is also possible to study the correlation 
structures of other probability distributions of exchangeable (homogeneous) correlated Bernoulli 
random variables. We study some distribution functions and discuss their behaviors in terms of 
their correlation structures. 

PACS numbers: 02.50.Cw 



I. INTRODUCTION 



Incorporating a correlation $\rho$ into Bernoulli random variables $X_i$ ($i = 1, 2, \cdots, N$), each taking the value 1 with probability $p$ and the value 0 with probability $1-p$, has a long history and has been widely discussed in a variety of areas of science, mathematics and engineering. Writing the expectation value of a random variable $A$ as $\langle A \rangle$, the correlation $\rho$ between $X_i$ and $X_j$ is defined as

$$\rho = \mathrm{Corr}(X_i, X_j) = \frac{\langle X_i X_j\rangle - \langle X_i\rangle\langle X_j\rangle}{\sqrt{\langle X_i\rangle(1-\langle X_i\rangle)\langle X_j\rangle(1-\langle X_j\rangle)}}. \tag{1}$$

If there is no correlation between the random variables, the number $n$ of variables taking the value 1 obeys the binomial probability distribution $b(N,p)$. The correlation $\rho$ is needed because there are many phenomena in which the dependency structure of the random events is crucial, or is necessary for the explanation of experimental data. 

For example, in biometrics, the teratogenic or toxicological effect of certain compounds has been studied. The interest resides in the number of affected fetuses or implantations in a litter. One-parameter models, such as the Poisson and binomial distributions, provided poor fits to the experimental data. A two-parameter alternative to these distributions, the beta-binomial distribution (BBD), has been proposed. In this model, the probability $p'$ of the binomial distribution $b(N,p')$ is itself a random variable and obeys the beta distribution $\mathrm{Be}(\alpha,\beta)$. 

$$P(p') = \frac{p'^{\alpha-1}(1-p')^{\beta-1}}{B(\alpha,\beta)}. \tag{2}$$

The resulting distribution has the probability function

$$P(n) = {}_N C_n \cdot \frac{B(\alpha+n, N+\beta-n)}{B(\alpha,\beta)}. \tag{3}$$

The mean $\mu$ and variance $\sigma^2$ of the BBD are

$$\mu = Np \quad\text{and}\quad \sigma^2 = Npq\,\frac{1+N\theta}{1+\theta}, \tag{4}$$

where

$$p = \frac{\alpha}{\alpha+\beta}, \quad q = 1-p = \frac{\beta}{\alpha+\beta} \quad\text{and}\quad \theta = \frac{1}{\alpha+\beta}. \tag{5}$$

$\theta$ is a measure of the variation in $p'$ and is called the "correlation level". The case of the pure binomial distribution corresponds to $\theta = 0$. However, the true correlation of the BBD is given by

$$\rho = \frac{1}{\alpha+\beta+1}. \tag{6}$$

The derivation of this relation is straightforward. If we denote the sum of the $X_i$ as $S = \sum_{i=1}^{N} X_i$, we can write $\langle X_i X_j\rangle = \langle S^2 - S\rangle / N(N-1)$ and $\langle X_i\rangle = \langle X_j\rangle = \langle S\rangle / N$. From eq. (1) and the results for the BBD, we obtain eq. (6). We can then rewrite the variance $\sigma^2$ as

$$\sigma^2 = Npq + N(N-1)pq\cdot\rho. \tag{7}$$
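The relations (4), (6) and (7) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names `log_beta`, `bbd_pmf` and `bbd_summary` and the parameter values are illustrative assumptions. It evaluates the probability function of eq. (3) with the standard library and recovers the pair correlation from $\langle S^2 - S\rangle$.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bbd_pmf(N, alpha, beta):
    """Beta-binomial probabilities P(n), n = 0..N, as in eq. (3)."""
    norm = log_beta(alpha, beta)
    return [comb(N, n) * exp(log_beta(alpha + n, N + beta - n) - norm)
            for n in range(N + 1)]


def bbd_summary(N, alpha, beta):
    """Return (mean, variance, pair correlation) computed from the pmf."""
    pmf = bbd_pmf(N, alpha, beta)
    mean = sum(n * P for n, P in enumerate(pmf))
    var = sum((n - mean) ** 2 * P for n, P in enumerate(pmf))
    p = mean / N
    # <X_i X_j> = <S^2 - S> / (N (N - 1)), where S counts the ones
    pair = sum(n * (n - 1) * P for n, P in enumerate(pmf)) / (N * (N - 1))
    rho = (pair - p * p) / (p * (1.0 - p))
    return mean, var, rho


N, alpha, beta = 30, 1.4, 2.1  # arbitrary illustrative values
mean, var, rho = bbd_summary(N, alpha, beta)
p, q = alpha / (alpha + beta), beta / (alpha + beta)
print(mean, N * p)                                   # mean vs eq. (4)
print(var, N * p * q + N * (N - 1) * p * q * rho)    # variance vs eq. (7)
print(rho, 1.0 / (alpha + beta + 1.0))               # correlation vs eq. (6)
```

Each printed pair should agree up to floating-point rounding.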

In computer engineering, in the context of the design of survivable storage systems, the modeling of correlated failures among storage nodes is a hot topic. In addition to the BBD, a correlated binomial model based on conditional failure probabilities has been proposed. The same kind of correlated binomial distribution based on conditional probabilities has also been introduced in financial engineering, where credit portfolio modeling has been extensively studied. In particular, the modeling of default correlation plays a central role in the pricing of portfolio credit derivatives, which were developed in order to manage the risk of joint default or the clustering of defaults. As a default distribution model for a homogeneous (exchangeable) credit portfolio, where the assets' default probabilities and default correlations are uniform and denoted by $p$ and $\rho$, Witt has introduced a correlated binomial model based on the conditional default probabilities $p_n$. Describing the defaulted (non-defaulted) state of the $i$-th asset by $X_i = 1$ ($X_i = 0$) and the joint default probability function by $P(x_1, x_2, \cdots, x_N)$, the $p_n$ are defined as

$$p_n = \Big\langle X_{n+1} \,\Big|\, \prod_{n'=1}^{n} X_{n'} = 1 \Big\rangle. \tag{8}$$

Here $\langle A|B\rangle$ means the expectation value of a random variable $A$ under the condition that $B$ is satisfied. The expectation value of $X_i$ signifies the default probability, and the condition $\prod_{n'=1}^{n} X_{n'} = 1$ corresponds to the situation where the first $n$ assets among $N$ have defaulted. In particular, $p_0 = p$. By the homogeneity (exchangeability) assumption, any $n$ assets among the $N$ can be chosen in the $n$-default condition $\prod_{n'=1}^{n} X_{n'} = 1$, and $X_{n+1}$ in eq. (8) can likewise be replaced by any asset that is not used in the $n$-default condition. 

In order to fix the joint default probability function completely, it is necessary to impose $N$ conditions on it under the homogeneity assumption. Witt and the authors have imposed the following condition on the conditional correlations,

$$\mathrm{Corr}\Big(X_{n+1}, X_{n+2} \,\Big|\, \prod_{n'=1}^{n} X_{n'} = 1\Big) = \rho\exp(-\lambda n) = \rho_n.$$

Here $\mathrm{Corr}(A,B|C)$ means the correlation between the random variables $A$ and $B$ under the condition that $C$ is satisfied. From these conditions, recursive relations for the $p_n$ are obtained, and the $p_n$ are calculated as

$$p_n = 1 - (1-p)\prod_{n'=0}^{n-1}(1-\rho_{n'}).$$

The joint default probability function and the default distribution function $P_N(n)$ have been expressed explicitly in terms of these $p_n$. However, the expression contains many terms with alternating signs, and it is not an easy task to evaluate them for $N > 100$. In addition, the ranges of the parameters $p$ and $\rho$ are restricted, and one cannot study the large correlation regime. Furthermore, for $p = 0.5$, the distribution does not have the $Z_2$ symmetry $P_N(n) = P_N(N-n)$. The distribution has an irregular shape and, for some choices of parameters, shows singular rippling. 
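As an illustration of how the conditional default probabilities follow from the conditional correlations, the sketch below iterates the recursion $p_{n+1} = p_n + (1-p_n)\rho_n$ (the $j = 0$ case of the recursive relation derived in section II) with $\rho_n = \rho\exp(-\lambda n)$ and compares it with the product form $p_n = 1-(1-p)\prod_{n'=0}^{n-1}(1-\rho_{n'})$. The function names and parameter values are hypothetical and chosen only for the example.

```python
from math import exp


def conditional_probs(p, rho, lam, n_max):
    """p_0..p_{n_max} from p_{n+1} = p_n + (1 - p_n) rho_n, rho_n = rho exp(-lam n)."""
    probs = [p]
    for n in range(n_max):
        rho_n = rho * exp(-lam * n)
        probs.append(probs[-1] + (1.0 - probs[-1]) * rho_n)
    return probs


def conditional_prob_closed_form(p, rho, lam, n):
    """p_n = 1 - (1 - p) prod_{n'=0}^{n-1} (1 - rho_{n'})."""
    survival = 1.0 - p
    for k in range(n):
        survival *= 1.0 - rho * exp(-lam * k)
    return 1.0 - survival


# Illustrative parameters (assumed): p = 0.1, rho = 0.3, lambda = 0.2
for n, p_n in enumerate(conditional_probs(0.1, 0.3, 0.2, 5)):
    print(n, p_n, conditional_prob_closed_form(0.1, 0.3, 0.2, n))
```

With $\lambda = 0$ the same functions reproduce the constant-correlation case $p_n = 1-(1-p)(1-\rho)^n$.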

In this paper, we propose a general method to construct correlated binomial models (CBMs) based on consistency conditions on the conditional probabilities and the conditional correlations. With this method, it is possible to study the correlation structure of any probability distribution function for exchangeable correlated Bernoulli random variables. The paper is organized as follows. In section II, we introduce the conditional probabilities $p_{ij}$ and conditional correlations $\rho_{ij}$ and show how to construct CBMs. We prove that the construction is self-consistent. In addition, in order to ensure probability conservation, i.e. normalization, the conditional correlations and the conditional probabilities should satisfy self-consistent relations. We also calculate the moments $\langle n^k\rangle$ of the model. In the course of this calculation, we introduce a linear operator $H$ which gives the joint probabilities from the "binomial" expansion of $(p+q)^N$. Section III is devoted to some solutions of the self-consistent relations. We obtain the beta-binomial distribution (BBD) under strong symmetry assumptions on the conditional correlations. For other probability distribution functions, which include Witt's model and distributions constructed by superposing binomial distributions (Bernoulli mixture models), we calculate $p_{ij}$ and $\rho_{ij}$. We study the probability distribution functions of these solutions from the viewpoint of their correlation structures $\rho_{ij}$. We conclude with some remarks and future problems in section IV. 



II. CORRELATED BINOMIAL MODELS AND THEIR CONSTRUCTIONS 

FIG. 1: Pascal's-triangle-like representation of $X_{ij}$ and $p_{ij}, q_{ij}$ up to $i+j \le 2$. $X_{00} = \langle 1\rangle$, $X_{10} = \langle X_1\rangle = p$, $X_{01} = \langle 1 - X_1\rangle = 1-p = q$, etc. 

In this section, we construct the joint probabilities and the distribution functions of CBMs. We introduce the following definitions. The first is the product of the $X_{i'}$ and $1 - X_{j'}$; these products include all observables of the model:

$$\Pi_{ij} = \prod_{i'=1}^{i} X_{i'} \prod_{j'=i+1}^{i+j} (1 - X_{j'}). \tag{9}$$



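The next definitions are their unconditional and conditional expectation values (see Figure 1):

$$X_{ij} = \langle \Pi_{ij}\rangle, \tag{10}$$

$$p_{ij} = \langle X_{i+j+1} \,|\, \Pi_{ij} = 1\rangle = \frac{X_{i+1\,j}}{X_{ij}}, \tag{11}$$

$$q_{ij} = \langle 1 - X_{i+j+1} \,|\, \Pi_{ij} = 1\rangle = \frac{X_{i\,j+1}}{X_{ij}}. \tag{12}$$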



We have $X_{00} = 1$, $X_{10} = p$ and $X_{01} = 1 - p = q$. Furthermore, the relation $p_{ij} + q_{ij} = 1$ should hold for any $i, j$ because of the identity $\langle 1 \,|\, \Pi_{ij} = 1\rangle = \langle X_{i+j+1} + (1 - X_{i+j+1}) \,|\, \Pi_{ij} = 1\rangle = 1$. All information is contained in $X_{ij}$. The joint probability $P(x_1, x_2, \cdots, x_N)$ with $\sum_{n'=1}^{N} x_{n'} = n$ is given by $X_{n\,N-n}$, and the distribution function $P_N(n)$ is calculated as

$$P_N(n) = {}_N C_n \cdot X_{n\,N-n}. \tag{13}$$



In order to evaluate $X_{ij}$, we need to calculate the product of the $p_{kl}$ and $q_{kl}$ along a path from $(0,0)$ to $(i,j)$. Any path may be chosen, and the product must not depend on the choice. This property is guaranteed by the following condition on $p_{ij}$ and $q_{ij}$:

$$q_{i+1\,j}\cdot p_{ij} = p_{i\,j+1}\cdot q_{ij} = \frac{X_{i+1\,j+1}}{X_{ij}}. \tag{14}$$



In order for $p_{ij}$ and $q_{ij}$ to satisfy these conditions, we introduce the following conditional correlations:

$$\mathrm{Corr}(X_{i+j+1}, X_{i+j+2} \,|\, \Pi_{ij} = 1) = \rho_{ij}. \tag{15}$$

We set $\rho_{00} = \rho$. The variables $(1 - X_{i+j+1})$ and $(1 - X_{i+j+2})$ are correlated with the same strength, so the following relation also holds:

$$\mathrm{Corr}((1 - X_{i+j+1}), (1 - X_{i+j+2}) \,|\, \Pi_{ij} = 1) = \rho_{ij}.$$

From these relations, we obtain the recursive relations for $p_{ij}$ and $q_{ij}$:

$$p_{i+1\,j} = p_{ij} + (1 - p_{ij})\rho_{ij}, \tag{16}$$

$$q_{i\,j+1} = q_{ij} + (1 - q_{ij})\rho_{ij}. \tag{17}$$



If we assume the identity $p_{ij} + q_{ij} = 1$, we obtain $q_{ij} = 1 - p_{ij}$, $q_{i+1\,j} = 1 - p_{i+1\,j} = (1 - p_{ij})(1 - \rho_{ij})$ and $p_{i\,j+1} = 1 - q_{i\,j+1} = p_{ij}(1 - \rho_{ij})$. Then $q_{i+1\,j}\cdot p_{ij} = p_{i\,j+1}\cdot q_{ij} = p_{ij}(1 - p_{ij})(1 - \rho_{ij})$ holds, and we see that the consistency relation (14) is satisfied. 

FIG. 2: Proof of the commutation relation $q_{i+1\,j}\cdot p_{ij} = p_{i\,j+1}\cdot q_{ij}$. 

The remaining consistency relation, the probability conservation identity, is $p_{ij} + q_{ij} = 1$. We prove the identity by induction. For $i = j = 0$, the identity holds trivially as $p_{00} + q_{00} = p + q = 1$. For $j = 0$ or $i = 0$, $q_{i0}$ and $p_{0j}$ are calculated as $q_{i0} = 1 - p_{i0}$ and $p_{0j} = 1 - q_{0j}$, and the identity also holds trivially. We then assume $p_{i\,j-1} + q_{i\,j-1} = 1$ and prove the identity $p_{ij} + q_{ij} = 1$. From the recursive equations (16) and (17) for $p_{ij}$ and $q_{ij}$, we have the following relation:

$$1 = p_{ij} + q_{ij} = p_{i-1\,j} + (1 - p_{i-1\,j})\rho_{i-1\,j} + (1 - p_{i\,j-1}) + p_{i\,j-1}\rho_{i\,j-1}. \tag{18}$$



For the identity to be satisfied, the conditional correlations $\rho_{i\,j-1}$ and $\rho_{i-1\,j}$ must satisfy the following relation:

$$p_{i-1\,j} - p_{i\,j-1} = -(1 - p_{i-1\,j})\rho_{i-1\,j} - p_{i\,j-1}\rho_{i\,j-1}. \tag{19}$$

If the conditional correlations $\rho_{ij}$ are fixed so as to satisfy this relation, the model becomes self-consistent. In other words, the relation guarantees the normalization of the resulting probability distribution. 
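The consistency relations can be verified on a concrete example. The sketch below is an illustration under stated assumptions, not part of the proof: it takes an arbitrary two-component binomial mixture as a test distribution (the weights and probabilities in `make_X` are made up), builds $p_{ij}$, $q_{ij}$ and $\rho_{ij}$ from $X_{ij}$ through eqs. (11), (12) and (16), and checks eqs. (14), (17) and (19) in exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction as F


def make_X(mixture):
    """X_ij = sum_w w p^i (1 - p)^j for a finite mixture of (weight, p) pairs."""
    def X(i, j):
        return sum(w * p ** i * (1 - p) ** j for w, p in mixture)
    return X


# Assumed test distribution: weights 1/3 and 2/3, success probabilities 1/5 and 7/10
X = make_X([(F(1, 3), F(1, 5)), (F(2, 3), F(7, 10))])


def p_(i, j):
    return X(i + 1, j) / X(i, j)          # eq. (11)


def q_(i, j):
    return X(i, j + 1) / X(i, j)          # eq. (12)


def rho_(i, j):
    # eq. (16) solved for rho_ij
    return (p_(i + 1, j) - p_(i, j)) / (1 - p_(i, j))


def check(n_max):
    """Assert eqs. (14), (17), (19) and p_ij + q_ij = 1 for 0 <= i, j < n_max."""
    for i in range(n_max):
        for j in range(n_max):
            assert p_(i, j) + q_(i, j) == 1
            assert q_(i + 1, j) * p_(i, j) == p_(i, j + 1) * q_(i, j)        # eq. (14)
            assert q_(i, j + 1) == q_(i, j) + (1 - q_(i, j)) * rho_(i, j)    # eq. (17)
            if i >= 1 and j >= 1:
                lhs = p_(i - 1, j) - p_(i, j - 1)
                rhs = (-(1 - p_(i - 1, j)) * rho_(i - 1, j)
                       - p_(i, j - 1) * rho_(i, j - 1))
                assert lhs == rhs                                            # eq. (19)
    return True


print(check(6))
```

Exact fractions are used so that the identities can be tested with `==` rather than with a floating-point tolerance.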




FIG. 3: Picture for the $p_{ij} + q_{ij} = 1$ condition. 



We now evaluate the moments of the CBM. For this purpose, we introduce the operators $H$ and $D_k$. The former is a linear operator $H$ which maps polynomials in $p, q$ to joint probabilities in $\mathbb{R}$. By linearity, we only need to fix its action on the monomial $p^i q^j$:

$$H[p^i q^j] = p_{00}p_{10}\cdots p_{i-1\,0}\; q_{i0}q_{i1}\cdots q_{i\,j-1}. \tag{20}$$



The joint probability $X_{n\,N-n}$ is expressed as $X_{n\,N-n} = H[p^n q^{N-n}]$. Here we choose the far left path from $(0,0)$ to $(n, N-n)$ on the Pascal's triangle (see Figure 1). The action of $H$ on the binomial expansion $(p+q)^N = 1^N$ can be interpreted as the probability distribution and its normalization condition:

$$1 = H[1^N] = H[(p+q)^N] = \sum_{n=0}^{N} {}_N C_n \cdot H[p^n q^{N-n}] = \sum_{n=0}^{N} {}_N C_n \cdot X_{n\,N-n}. \tag{21}$$

In order to calculate the moments of the CBM, it is necessary to insert $n^k$ into the above summation. Instead, we insert $n(n-1)(n-2)\cdots(n-k+1)$ and introduce the following differential operators $D_k$:

$$D_k = \sum_{0 \le i_1, i_2, \cdots, i_k \le N-1} p_{i_1 0}\,p_{i_2 0}\cdots p_{i_k 0}\,\frac{\partial^k}{\partial p_{i_1 0}\,\partial p_{i_2 0}\cdots\partial p_{i_k 0}}. \tag{22}$$

The action of $D_k$ on $X_{n\,N-n}$ for $n \ge k$ is

$$D_k X_{n\,N-n} = n(n-1)(n-2)\cdots(n-k+1)\,X_{n\,N-n}. \tag{23}$$

On the other hand, the same expression can be obtained as

$$H\Big[p^k \frac{\partial^k}{\partial p^k}\,p^n q^{N-n}\Big] = H[n(n-1)(n-2)\cdots(n-k+1)\,p^n q^{N-n}] = n(n-1)(n-2)\cdots(n-k+1)\,X_{n\,N-n}. \tag{24}$$

This relation defines the action of $D_k$ on $H[f(p,q)]$ for any polynomial $f(p,q)$ as

$$D_k H[f(p,q)] = H\Big[p^k \frac{\partial^k}{\partial p^k} f(p,q)\Big]. \tag{25}$$



The expectation value of $n(n-1)\cdots(n-k+1)$ is calculated by the action of the operator $D_k$ on the binomial expansion of $H[1^N] = H[(p+q)^N]$:

$$D_k H[(p+q)^N] = \sum_{n=0}^{N} {}_N C_n \cdot D_k X_{n\,N-n}. \tag{26}$$

The right hand side is nothing but the expectation value $\langle n(n-1)(n-2)\cdots(n-k+1)\rangle$. The left hand side is calculated by using eq. (25) as

$$D_k H[(p+q)^N] = H\Big[p^k \frac{\partial^k}{\partial p^k}(p+q)^N\Big] = N(N-1)(N-2)\cdots(N-k+1)\,H[p^k (p+q)^{N-k}]$$
$$= N(N-1)(N-2)\cdots(N-k+1)\,H[p^k] = N(N-1)(N-2)\cdots(N-k+1)\,p_{00}p_{10}p_{20}\cdots p_{k-1\,0}. \tag{27}$$

We obtain the relation

$$\langle n(n-1)(n-2)\cdots(n-k+1)\rangle = N(N-1)(N-2)\cdots(N-k+1)\,p_{00}p_{10}p_{20}\cdots p_{k-1\,0}. \tag{28}$$

From this relation, we can evaluate the moments of the CBM. 
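Relation (28) can be tested directly on an example distribution. In the sketch below, the two-component mixture `mix` is an assumed test case and the function names are hypothetical; the product $p_{00}p_{10}\cdots p_{k-1\,0}$ is evaluated as $X_{k0}$, to which it telescopes by the definition of $p_{i0}$.

```python
from fractions import Fraction as F
from math import comb

# Assumed test distribution: (weight, p') pairs of a two-component binomial mixture
mix = [(F(1, 4), F(1, 10)), (F(3, 4), F(3, 5))]


def X(i, j):
    """Joint probability X_ij of i ones followed by j zeros."""
    return sum(w * p ** i * (1 - p) ** j for w, p in mix)


def falling(n, k):
    """n (n - 1) ... (n - k + 1); equals 1 for k = 0."""
    out = 1
    for m in range(k):
        out *= n - m
    return out


def factorial_moment_direct(N, k):
    """<n(n-1)...(n-k+1)> summed over P_N(n) = C(N, n) X_{n, N-n}."""
    return sum(falling(n, k) * comb(N, n) * X(n, N - n) for n in range(N + 1))


def factorial_moment_eq28(N, k):
    """Right-hand side of eq. (28), using p_00 p_10 ... p_{k-1,0} = X_{k0}."""
    return falling(N, k) * X(k, 0)


for k in range(5):
    print(k, factorial_moment_direct(12, k) == factorial_moment_eq28(12, k))
```

The ordinary moments $\langle n^k\rangle$ then follow as linear combinations of these factorial moments.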



III. BETA-BINOMIAL DISTRIBUTION AND OTHER SOLUTIONS 



In the previous section, we derived the self-consistent equations for $p_{ij}$ and $\rho_{ij}$. They are summarized as

$$p_{i+1\,j} = p_{ij} + (1 - p_{ij})\rho_{ij}, \tag{29}$$

$$p_{i\,j+1} = p_{ij} - p_{ij}\rho_{ij}, \tag{30}$$

$$p_{i-1\,j} - p_{i\,j-1} = -(1 - p_{i-1\,j})\rho_{i-1\,j} - p_{i\,j-1}\rho_{i\,j-1}. \tag{31}$$

In this section, we show several solutions to these equations. We note that if one knows the joint probabilities $X_{ij}$, one can evaluate $p_{ij}$ from the definitions of $p_{ij}$ and $q_{ij}$. The $\rho_{ij}$ are then evaluated from the recursive equation (29). In addition, we interpret the behavior of the solutions from the viewpoint of their correlation structures. 



A. Beta-binomial Distribution 



In order to solve the above relations for $p_{ij}$ and $\rho_{ij}$, we use a symmetry argument. For $p = 1/2$, the model should have particle-hole duality between $X$ and $1 - X$, i.e. $Z_2$ symmetry. Then $\rho_{ij} = \rho_{ji}$ should hold. We make the stronger assumption that, for any $p$, the system has the $Z_2$ symmetry and $\rho_{ij}$ depends on $i, j$ only through the combination $n = i + j$. With the replacement of indices $i \to i+1$ and $j = n - i$, eq. (31) reduces to

$$p_{i\,n-i} - p_{i+1\,n-i-1} = \rho_n(-1 + p_{i\,n-i} - p_{i+1\,n-i-1}). \tag{32}$$

From this relation, we see that the $p_{ij}$ with the same $n = i + j$ form an arithmetic sequence with the common difference

$$p_{i+1\,n-i-1} - p_{i\,n-i} = \Delta_n. \tag{33}$$

$\Delta_n$ satisfies the following equation:

$$\Delta_n = \rho_n(1 + \Delta_n). \tag{34}$$

$\rho_n$ can be solved in terms of $\Delta_n$ as

$$\rho_n = \frac{\Delta_n}{1 + \Delta_n}. \tag{35}$$

From the relation (29) for $p_{ij}$, we obtain the following recursive relation for $\rho_n$:

$$\rho_n = \frac{\Delta_n}{1 + \Delta_n} = \frac{\Delta_{n-1}(1 - \rho_{n-1})}{1 + \Delta_{n-1}(1 - \rho_{n-1})} = \frac{\rho_{n-1}}{1 + \rho_{n-1}}. \tag{36}$$

The explicit forms of $\rho_n$ and $\Delta_n$ are

$$\rho_n = \frac{\rho}{1 + n\rho} \quad\text{and}\quad \Delta_n = \rho_{n-1}. \tag{37}$$



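Then $p_{ij}$ and $q_{ij}$ can be obtained explicitly, and the results are

$$p_{ij} = p_{i+j\,0} - j\Delta_{i+j} = \frac{p(1-\rho) + i\rho}{1 + (i+j-1)\rho}, \tag{38}$$

$$q_{ij} = 1 - p_{ij} = \frac{(1-p)(1-\rho) + j\rho}{1 + (i+j-1)\rho}. \tag{39}$$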

The $X_{n\,N-n}$ are then obtained by taking the product of these conditional probabilities along the path from $(0,0)$ to $(n, N-n)$:

$$X_{n\,N-n} = \prod_{i=0}^{n-1} p_{i0} \prod_{j=0}^{N-n-1} q_{nj}. \tag{40}$$

Putting the above results for $p_{ij}$ and $q_{ij}$ into this expression, we obtain

$$X_{n\,N-n} = \frac{\prod_{i=0}^{n-1}\big(p(1-\rho) + i\rho\big)\,\prod_{j=0}^{N-n-1}\big(q(1-\rho) + j\rho\big)}{\prod_{k=0}^{N-1}\big(1 + (k-1)\rho\big)}. \tag{41}$$

Here $q = 1 - p$. Multiplying by the binomial coefficient ${}_N C_n$, we obtain the distribution function $P_N(n)$ as

$$P_N(n) = {}_N C_n \cdot X_{n\,N-n}. \tag{42}$$

This distribution is nothing but the beta-binomial distribution function (see eq. (3)) with the suitable replacement $(p, \rho) \leftrightarrow (\alpha, \beta)$. 
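The identification with the beta-binomial distribution can be confirmed numerically. The sketch below multiplies the conditional probabilities of eqs. (38) and (39) along the path of eq. (40) and compares the result with eq. (3), using the parameter map $\alpha = p(1-\rho)/\rho$, $\beta = (1-p)(1-\rho)/\rho$ that follows from eqs. (5) and (6). Function names and parameter values are illustrative assumptions.

```python
from math import comb, exp, lgamma


def cbm_bbd_pmf(N, p, rho):
    """P_N(n) from eqs. (38)-(42): product of p_{i0} and q_{nj} along the path."""
    q = 1.0 - p
    pmf = []
    for n in range(N + 1):
        x = 1.0
        for i in range(n):                       # p_{i0}, eq. (38) with j = 0
            x *= (p * (1.0 - rho) + i * rho) / (1.0 + (i - 1) * rho)
        for j in range(N - n):                   # q_{nj}, eq. (39) with i = n
            x *= (q * (1.0 - rho) + j * rho) / (1.0 + (n + j - 1) * rho)
        pmf.append(comb(N, n) * x)
    return pmf


def beta_binomial_pmf(N, alpha, beta):
    """Beta-binomial probabilities from eq. (3)."""
    def log_beta(a, b):
        return lgamma(a) + lgamma(b) - lgamma(a + b)
    norm = log_beta(alpha, beta)
    return [comb(N, n) * exp(log_beta(alpha + n, N + beta - n) - norm)
            for n in range(N + 1)]


N, p, rho = 30, 0.5, 0.3
alpha = p * (1.0 - rho) / rho            # from p = alpha / (alpha + beta)
beta = (1.0 - p) * (1.0 - rho) / rho     # and rho = 1 / (alpha + beta + 1)
cbm = cbm_bbd_pmf(N, p, rho)
bbd = beta_binomial_pmf(N, alpha, beta)
print(max(abs(a - b) for a, b in zip(cbm, bbd)))
```

The printed maximum difference should be at the level of floating-point rounding.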



B. Moody's Correlated Binomial Model 



In the original work by Witt, $\rho_{i0} = \rho$ was assumed for all $i$. We call this model Moody's Correlated Binomial (MCB) model. The above consistency equations are difficult to solve in this case, and the only available analytic expressions are those for $p_{i0}$, namely $p_{i0} = 1 - (1-\rho)^i(1-p)$. With this result, we only have a formal expression for $X_{ij}$:

$$X_{ij} = \langle \Pi_{ij}\rangle = \Big\langle \prod_{i'=1}^{i} X_{i'} \prod_{j'=i+1}^{i+j} (1 - X_{j'}) \Big\rangle = \sum_{k=0}^{j} (-1)^k\, {}_j C_k \Big\langle \prod_{i'=1}^{i+k} X_{i'} \Big\rangle = \sum_{k=0}^{j} (-1)^k\, {}_j C_k \cdot X_{i+k\,0}, \tag{43}$$

where $X_{i+k\,0} = p_{00}p_{10}\cdots p_{i+k-1\,0}$. With this expression, it is possible to evaluate $p_{ij}$, $q_{ij}$ and $\rho_{ij}$ from their definitions. However, equation (43) contains the alternating factors ${}_j C_k(-1)^k$, and as $N$ becomes large it becomes difficult to evaluate them numerically. With the above choice $\rho_{i0} = \rho$, it is possible to set $N = 30$. If $\rho_{i0}$ damps as $\exp(-\lambda i)$ with some positive $\lambda$, we can set at most $N \simeq 100$ for small values of $p$ and $\rho$. 
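The alternating sum of eq. (43) can be evaluated without cancellation error if exact rational arithmetic is used. This is a technique swapped in for illustration, not the procedure used in the text. The sketch below, with the hypothetical function name `mcb_pmf`, builds $X_{m0}$ from $p_{i0} = 1-(1-\rho)^i(1-p)$ and returns $P_N(n) = {}_N C_n X_{n\,N-n}$ as fractions. The returned values are not guaranteed to be non-negative for every choice of $p$ and $\rho$.

```python
from fractions import Fraction as F
from math import comb


def mcb_pmf(N, p, rho):
    """P_N(n) of the constant-rho_{i0} model via eq. (43), in exact arithmetic."""
    # X_{m0} = p_00 p_10 ... p_{m-1,0}, with p_{i0} = 1 - (1 - rho)^i (1 - p)
    X0 = [F(1)]
    for i in range(N):
        X0.append(X0[-1] * (1 - (1 - rho) ** i * (1 - p)))
    pmf = []
    for n in range(N + 1):
        j = N - n
        # Inclusion-exclusion over the j variables required to be zero
        x = sum((-1) ** k * comb(j, k) * X0[n + k] for k in range(j + 1))
        pmf.append(comb(N, n) * x)
    return pmf


N, p, rho = 30, F(1, 2), F(3, 10)
pmf = mcb_pmf(N, p, rho)
print(sum(pmf))                                   # normalization
print(sum(n * P for n, P in enumerate(pmf)))      # mean, to compare with N p
print([float(P) for P in pmf[-3:]])               # upper tail of the profile
```

Passing `Fraction` arguments keeps every intermediate value exact; converting to `float` only at the end avoids the loss of precision caused by the alternating binomial coefficients.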



C. Mixed Binomial Models: Bernoulli Mixture Models 



In a Bernoulli mixture model with mixing probability distribution function $f(p)$, the joint probability function is calculated as

$$X_{ij} = \langle \Pi_{ij}\rangle = \int_0^1 dp\, f(p)\, p^i (1-p)^j. \tag{44}$$

If we use the beta distribution for $f(p)$, we obtain eq. (41). However, this does not mean that it is trivial to solve the consistency equations with the assumption $\rho_{ij} = \rho_{i+j}$ and obtain the BBD. The consistency equations completely determine any correlated binomial distribution for exchangeable Bernoulli random variables, and every correlated binomial distribution obeys them. With the assumption $\rho_{ij} = \rho_{i+j}$, we are automatically led to the BBD. That is, we have proved that a probability distribution with the symmetry $\rho_{ij} = \rho_{i+j}$ is the BBD. No other probability distribution has this symmetry. 

Here we consider the relation between the CBM and the Bernoulli mixture model. According to De Finetti's theorem, the probability distribution of any infinite sequence of exchangeable Bernoulli random variables can be expressed as a mixture of binomial distributions [10]. A CBM in the $N \to \infty$ limit should therefore be expressible as such a mixture. From eq. (44), we have the relation $P(x_1 = 1, x_2 = 1, \cdots, x_k = 1) = X_{k0} = \int f(p)\, p^k\, dp$. Since $X_{k0}$ is expressed as $X_{k0} = p_{00}p_{10}\cdots p_{k-1\,0}$, we have a correspondence between the moments of $f(p)$ and a CBM. That is, if one knows $p_{i0}$ for every $i$, one knows the mixing function $f(p)$, and vice versa. This correspondence shows the equivalence of the CBM and the Bernoulli mixture model in the large $N$ limit. However, CBMs with finite $N$ can describe a wider class of probability distributions. In the Bernoulli mixture model, the variance of $p$ is non-negative and the correlation $\rho$ cannot be negative. In a CBM, we can set $\rho$ negative for small system size $N$. In addition, the CBM is useful for constructing probability distributions and discussing their correlation structure. In particular, we can understand the symmetry of the solution. For example, suppose we want distributions with $Z_2$ symmetry. In the Bernoulli mixture model, we need to impose on $f(p)$ the conditions

$$\int_0^1 f(p)\,(p - 0.5)^{2k+1}\, dp = 0, \tag{45}$$

where $k = 1, 2, \cdots$. On the other hand, in a CBM, we only need to seek a solution with $p_{ii} = q_{ii} = \frac{1}{2}$. This simple constraint is useful in the construction and in the parameter calibration of CBMs. 
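The statement that a finite CBM admits negative correlation can be made concrete for $N = 2$. The following sketch uses the hypothetical function name `cbm_two_variables` and illustrative parameter values. It builds the joint probabilities from the recursions (29) and (17) at $i = j = 0$ and shows a valid distribution with $\rho < 0$ for which $X_{20} < p^2$; a Bernoulli mixture would need $X_{20} - p^2 = \mathrm{Var}(p') \ge 0$.

```python
from fractions import Fraction as F


def cbm_two_variables(p, rho):
    """Joint probabilities (X_20, X_11, X_02) of two exchangeable Bernoulli variables."""
    q = 1 - p
    p10 = p + (1 - p) * rho      # eq. (29) with i = j = 0
    q01 = q + (1 - q) * rho      # eq. (17) with i = j = 0
    X20 = p * p10                # both variables equal to 1
    X02 = q * q01                # both variables equal to 0
    X11 = p * (1 - p10)          # one of each; the same value as q * (1 - q01)
    assert X11 == q * (1 - q01)
    return X20, X11, X02


p, rho = F(3, 10), F(-1, 5)      # assumed values with negative correlation
X20, X11, X02 = cbm_two_variables(p, rho)
print(X20, X11, X02)
print(X20 + 2 * X11 + X02)       # normalization over the four configurations
print(X20 < p * p)               # True means no mixture representation exists
```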

As other mixing functions $f(p)$, we consider cases which correspond to the long-range Ising model with some strength of correlation $\rho > 0$. That model has correlation only in the regime where the probability distribution $p(m)$ of the magnetization has two peaks at $m_1, m_2$ for $T < T_c$ [9]. If the system size $N$ is large enough, the distribution can be approximated by the superposition of two binomial distributions. If we take $N \to \infty$ for $T < T_c$, the system loses its ergodicity, the phase space breaks up into two spaces with $m > 0$ and $m < 0$, and the correlation disappears. Even if two peaks appear in $p(m)$, only one of them represents the real equilibrium state. 

The precise values of $m_1$ and $m_2$ depend on the model parameters; we consider the cases which correspond to $p = 0.5$ (the $Z_2$ symmetric case) and $p \simeq 0$. For the $Z_2$ symmetric case, there is no external field and $m_1 = -m_2$ holds. Between the Bernoulli random variable $X$ and the Ising spin variable $S$, there is the mapping $X = \frac{1}{2}(1 - S)$. $f(p')$ has two peaks at $p$ and $q = 1 - p$ with the same height. On the other hand, for $T \simeq 0$ and an infinitely weak positive external field $\sim O(1/N)$, $p(m)$ has one tall peak at $m_1 \simeq 1$ and another short peak at $m_2 \simeq -1$. In the language of Bernoulli random variables, $f(p')$ has a tall peak at $p' = p'' \simeq 0$ and a short peak at $p' \simeq 1$. We consider the following mixing functions and call them Two-Binomial models. 

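• $f(p') = \frac{1}{2}\delta(p' - p) + \frac{1}{2}\delta(p' - q)$ with $q = 1 - p$. 

This mixing function corresponds to the long-range Ising model with $Z_2$ symmetry and $\rho > 0$. The $X_{ij}$ are given as

$$X_{ij} = \frac{1}{2}\big(p^i q^j + p^j q^i\big). \tag{46}$$

$p_{ij}$ and $\rho_{ij}$ are calculated easily as

$$p_{ij} = \frac{p^{i+1}q^j + p^j q^{i+1}}{p^i q^j + p^j q^i}, \tag{47}$$

$$\rho_{ij} = \frac{p^{i+j}q^{i+j}(p-q)^2}{p^{i+j}q^{i+j}(p^2+q^2) + qp\,(p^{2i}q^{2j} + q^{2i}p^{2j})}. \tag{48}$$

This solution has the $Z_2$ symmetry $\rho_{ij} = \rho_{ji}$. 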

• $f(p') = \frac{p^k}{p^k+q^k}\,\delta(p' - p) + \frac{q^k}{p^k+q^k}\,\delta(p' - q)$ with $q = 1 - p$. 

This is a modified version of the above solution with a parameter $k = 0, 1, \cdots$. If we set $k = 0$, it is nothing but the above solution. The $X_{ij}$ are given as

$$X_{ij} = \frac{1}{p^k + q^k}\big(p^i q^j p^k + p^j q^i q^k\big). \tag{49}$$

$p_{ij}$ and $\rho_{ij}$ are

$$p_{ij} = \frac{p^{i+k+1}q^j + p^j q^{i+k+1}}{p^{i+k}q^j + p^j q^{i+k}}, \tag{50}$$

$$\rho_{ij} = \frac{p^{i+j+k}q^{i+j+k}(p-q)^2}{p^{i+j+k}q^{i+j+k}(p^2+q^2) + qp\,(p^{2i+2k}q^{2j} + q^{2i+2k}p^{2j})}. \tag{51}$$

If we denote $C_1 = \frac{p^k}{p^k+q^k}$ and $C_2 = \frac{q^k}{p^k+q^k}$, the mixing function becomes $f(p') = C_1\delta(p' - p) + C_2\delta(p' - q)$. This solution may look trivial. One obtains it by a parallel shift of the above solution (46): we replace $X_{ij}$ with $X_{i+k\,j}$ in eq. (46), normalized so that $X_{00} = 1$. We note here that such a parallel shift may give rise to another solution. 

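• $f(p') = (1-\alpha)\,\delta(p' - p'') + \alpha\,\delta(p' - 1)$. 

This mixing function corresponds to the long-range Ising model without $Z_2$ symmetry, with $\langle S_i\rangle \simeq 1$ and $\rho > 0$. We call this model the Binomial plus (B+) model, because it is a binomial distribution plus one small peak at $n = N$. Between $p, \rho$ and $p'', \alpha$, we have the relations

$$p = \alpha + (1-\alpha)p'' \quad\text{and}\quad \rho = \frac{\alpha(1-p'')}{\alpha + (1-\alpha)p''} \tag{52}$$

and

$$p'' = p(1-\rho) \quad\text{and}\quad \alpha = \frac{p\rho}{1 - p + p\rho}. \tag{53}$$

The $X_{ij}$ are given as

$$X_{ij} = (1-\alpha)\,p''^{\,i}(1-p'')^{j} + \alpha\,\delta_{j,0}. \tag{54}$$

$p_{ij}$ and $\rho_{ij}$ are calculated easily as

$$p_{i0} = \frac{\alpha + (1-\alpha)p''^{\,i+1}}{\alpha + (1-\alpha)p''^{\,i}} \quad\text{and}\quad p_{ij} = p'' \ \text{for}\ j \ne 0 \tag{55}$$

and

$$\rho_{i0} = \frac{\alpha(1-p'')}{\alpha + (1-\alpha)p''^{\,i+1}} \quad\text{and}\quad \rho_{ij} = 0 \ \text{for}\ j \ne 0. \tag{56}$$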
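The closed forms for the Two-Binomial and B+ models can be cross-checked against the general definition $\rho_{ij} = (p_{i+1\,j} - p_{ij})/(1 - p_{ij})$ with $p_{ij} = X_{i+1\,j}/X_{ij}$. The sketch below does this in exact arithmetic for eqs. (46), (48), (54) and (56), and inverts eq. (52) to obtain $p''$ and $\alpha$ from $p$ and $\rho$. All function names and parameter values are assumptions made for the example.

```python
from fractions import Fraction as F


def rho_from_X(X, i, j):
    """rho_ij = (p_{i+1,j} - p_ij) / (1 - p_ij), with p_ij = X_{i+1,j} / X_ij."""
    p_ij = X(i + 1, j) / X(i, j)
    p_next = X(i + 2, j) / X(i + 1, j)
    return (p_next - p_ij) / (1 - p_ij)


def two_binomial_X(p):
    """X_ij of eq. (46): equal-weight peaks at p and q = 1 - p."""
    q = 1 - p
    return lambda i, j: (p ** i * q ** j + p ** j * q ** i) / 2


def rho_two_binomial(p, i, j):
    """Closed form of eq. (48)."""
    q = 1 - p
    num = (p * q) ** (i + j) * (p - q) ** 2
    den = ((p * q) ** (i + j) * (p * p + q * q)
           + q * p * (p ** (2 * i) * q ** (2 * j) + q ** (2 * i) * p ** (2 * j)))
    return num / den


def b_plus_params(p, rho):
    """Invert eq. (52): return (p'', alpha) for given p and rho."""
    return p * (1 - rho), p * rho / (1 - p + p * rho)


def b_plus_X(p2, alpha):
    """X_ij of eq. (54): a binomial with parameter p'' plus weight alpha at n = N."""
    return lambda i, j: ((1 - alpha) * p2 ** i * (1 - p2) ** j
                         + (alpha if j == 0 else 0))


def rho_b_plus(p2, alpha, i):
    """Closed form of eq. (56) for j = 0."""
    return alpha * (1 - p2) / (alpha + (1 - alpha) * p2 ** (i + 1))


X2 = two_binomial_X(F(1, 5))
print(rho_from_X(X2, 2, 1) == rho_two_binomial(F(1, 5), 2, 1))

p2, alpha = b_plus_params(F(1, 10), F(3, 10))
Xb = b_plus_X(p2, alpha)
print(rho_from_X(Xb, 0, 0), rho_from_X(Xb, 3, 0) == rho_b_plus(p2, alpha, 3))
```

The helper `rho_from_X` accepts any callable returning $X_{ij}$, so the same check can be applied to other candidate solutions of the consistency equations.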



D. Correlation Structures of the Solutions 



In this subsection, we study the relation between the probability distributions and their correlation structures. Figure 4 shows the probability distribution profiles for three correlated models: the MCB, BBD and Two-Binomial models. We set $p = 0.5$, $\rho = 0.3$ and $N = 30$. We also show the pure binomial distribution for comparison. The three correlated curves have the same $p$ and $\rho$; however, their profiles are drastically different. The Two-Binomial model with $Z_2$ symmetry has two peaks, and their overlap decreases as $N$ increases. In the thermodynamic limit $N \to \infty$, the overlap disappears and the system loses its ergodicity. The long-range Ising model shows spontaneous symmetry breaking (SSB) of the $Z_2$ symmetry. On the other hand, the BBD's profile is broad, and even if we take $N \to \infty$ we obtain the beta distribution, with an almost unchanged shape. That is, the BBD system does not show SSB and maintains its $Z_2$ (particle-hole) symmetry at $p = 0.5$. 

The profile of the MCB model is peculiar. It is not symmetric and shows singular rippling. The origin of the rippling can be understood by inspecting its correlation structure. Figure 5 shows the correlation structures of the above three models. The parameters are the same, and we show $\rho_{i\,30-i}$. In contrast to the BBD's correlation, which is constant for fixed $i+j$, the correlation of the MCB has a sharp peak at $i = 30$ and shows a strong rippling structure. The curve is not symmetric, and the distortion is reflected in the shape of the probability distribution. On the other hand, the correlation curve of the Two-Binomial distribution has a strong peak at $i = N/2$ and is very different from the BBD's correlation curve. This strong peak and rapid decay may be reflected in the decomposition of the probability distribution. However, we have not yet understood the relation well. 

FIG. 4: Probability distribution $P_{30}(n)$ for $p = 0.5$, $\rho = 0.3$ and $N = 30$. We show 3 distributions, MCB (solid line), beta-binomial (dotted line) and Two-Binomial (thin dotted line). We also show a binomial distribution ($\rho = 0.0$) for comparison. 

FIG. 5: Correlation $\rho_{i\,30-i}$ for MCB (solid line), BBD (thin dotted line) and Two-Binomial (dotted line) models. We set $\rho = 0.3$ and $p = 0.5$ as in the previous figure. 



Figure 6 shows the probability distributions for the MCB, BBD and B+ models. We set $p = 0.1$, $\rho = 0.3$ and $N = 30$. We also show the pure binomial distribution for comparison. The MCB and the BBD have almost the same bulk shape; however, the MCB has a small peak at $n = 30$. The B+ model has a stronger peak at $n = 30$, and its bulk shape can be obtained by a small left shift of the pure binomial distribution with $p = 0.1$. These differences in profile are reflected in the correlation structures. Figure 7 shows the correlation structures of the above three models, with the same parameters as in the previous figure. In contrast to the constant BBD structure, the MCB and B+ models have a peak at $i = 30$. The MCB has a small peak and the B+ model has a tall peak, and the difference is reflected in the size of the tail peak of their probability distributions. 

FIG. 6: Probability distribution $P_{30}(n)$ for $p = 0.1$, $\rho = 0.3$ and $N = 30$. We show 3 distributions, MCB (solid line), beta-binomial (dotted line) and B+ (thin dotted line). We also show a binomial distribution ($\rho = 0.0$) for comparison. 

FIG. 7: Correlation $\rho_{i\,30-i}$ for MCB (solid line), BBD (thin dotted line) and B+ (dotted line) models. We set $\rho = 0.3$ and $p = 0.1$. 



IV. CONCLUDING REMARKS AND FUTURE PROBLEMS 



In this paper, we have presented a general method to construct correlated binomial models. We have also evaluated their moments. Our method includes Witt's model and the BBD. In addition, with the consistency equations for $p_{ij}$ and $\rho_{ij}$, it is possible to prepare correlated binomial distributions with any choice of $\rho_{i0}$ or $p_{i0}$. Of course, the resulting distribution function should be non-negative, so 'any' should be taken with some care. In addition, from the joint probabilities $X_{ij}$, it is possible to evaluate $p_{ij}$ and $\rho_{ij}$. We can thus see the detailed structure of a system with any distribution function. In the work on storage systems, the conditional failure probabilities $p_{i0}$ were studied. Some recursive relations for $p_{i0}$ were proposed, and the resulting conditional probabilities $p_{i0}$ were compared with real data on server networks. We note that $p_{i0}$ can be freely changed, and it may be possible to obtain a good fit to the data. However, if the correlation structure $\rho_{ij}$ becomes too complex and shows oscillation, such modeling may be over-fitting. 

Finally, we comment on future problems. The first is to seek other interesting solutions to eq. (29), eq. (30) and eq. (31) for $p_{ij}$ and $\rho_{ij}$. In this paper, we have assumed a strong symmetry of $\rho_{ij}$ in the derivation of the BBD. For any value of $p$, we have assumed the $Z_2$ symmetry $\rho_{ij} = \rho_{ji}$. Furthermore, we have assumed the stronger constraint that $\rho_{ij}$ depends on $i, j$ only through the combination $i + j$. The consistency relation is then solved easily and we get the BBD. However, we think that the space of correlated binomial distributions is rich and that other interesting solutions may exist. We have discussed some simple solutions which are superpositions of two binomial distributions. They try to mimic the long-range Ising model in the large $N$ limit with $\rho > 0$. A simple seamless solution of the consistency relations which corresponds to the long-range Ising model may exist. Taking the continuous limit of the consistency relations and studying their solutions is also an interesting problem. The solution space may become narrower; however, differential equations are more tractable than recursion relations. The beta distribution and the superposition of delta functions, which are the continuous limits of the simple solutions presented here, should be among the solutions. 

The second problem is the generalization of the present method. In this paper, we have assumed that the Bernoulli random variables are all exchangeable. If one intends to apply the correlated binomial model to the real world, this idealization should be relaxed. One possibility is inhomogeneity in $p$ and another is inhomogeneity in $\rho$. The first step is to add one more Bernoulli random variable $Y$ to the system of $N$ exchangeable variables. This $N + 1$ system has been treated previously, but it seems much more difficult to introduce the self-consistent equations in the present context. However, such a generalization may lead us to new probability distribution functions, and we believe that it deserves extensive study. 